Google Cloud Storage

Google Cloud Storage (GCS)

Question: we have so many storage, why we still need GCS (Google Cloud Storage)?

Answer: to store unstructured data (file, image, video, etc.) and implement CDN. BigTable, BigQuery, ElasticSearch are all for structured data.

Definition: A Powerful, Simple and Cost Effective Object Storage Service.
Think it as Google Drive, Dropbox, Amazon S3.

Features

Durable

  • Google Cloud Storage is designed for 99.999999999% durability. It stores data redundantly, with automatic checksums to ensure data integrity. With Multi-Regional storage, your data is maintained in geographically distinct locations.

    Available

  • All storage classes offer very high availability around the world.

    Scalable

    Inexpensive

  • Good to store unstructured data like image, file, video, etc.

Content Delivery Network (CDN)

A content delivery network or content distribution network (CDN) is a geographically distributed network of proxy servers and their data centers.

CDN, offers an efficient, cost-effective way of reducing both network I/O costs and content delivery latency for regularly accessed website assets.

A CDN can be understood as group of geographically distributed caches, with each cache locaœted in one of several global points of presence.

Traditional server vs. CDN
4. Google Cloud Storage-2019-4-29-11-31-29.png
Companies like Akamai, Cloudflare, etc.

GCS Bucket

First of all, we will need a gcs bucket. Why we need this bucket? It’s like a folder which we can have files (images/videos posted from user) in it. Bucket == Folder.

Open your console.cloud.google.com and choose Storage -> Browser.

Then click ‘CREATE BUCKET’ to create a new bucket.

Pick a name for it, starts with post-images-, and the suffix is your project ID to avoid any conflict with other users. If used by others, add a random number. For example, I use post-images-75015 while yours will be different. Please remember this bucket name and we will use it later.

When asked about ‘Default storage class’, Regional is ok as our servers (GAE flex) are in us-central-, so we may put this bucket in us-central- too. Click ‘save’ to save your changes.

Multi part Form

A HTTP multipart request is a HTTP request that HTTP clients construct to send files and data over to a HTTP Server.

It is commonly used by browsers and HTTP clients to upload files to the server.

The content type “application/x-www-form-urlencoded“ is inefficient for sending large quantities of binary data or text containing non-ASCII characters.

The content type “multipart/form-data“ should be used for submitting forms that contain files, non-ASCII data, and binary data.

How to send multipart request in Postman?

4. Google Cloud Storage-2019-4-30-9-46-47.png

How does it look like?

4. Google Cloud Storage-2019-4-30-9-47-2.png

How to parse multipart form in Go?

1
func (r *Request) ParseMultipartForm(maxMemory int64) error

https://golang.org/pkg/net/http/#Request.ParseMultipartForm

https://github.com/golang-samples/http/blob/master/fileupload/main.go

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41


package main

import (
"fmt"
"io/ioutil"
"log"
"net/http"
"os"
)

// 1MB
const MAX_MEMORY = 1 * 1024 * 1024

func upload(w http.ResponseWriter, r *http.Request) {
if err := r.ParseMultipartForm(MAX_MEMORY); err != nil {
log.Println(err)
http.Error(w, err.Error(), http.StatusForbidden)
}

for key, value := range r.MultipartForm.Value {
fmt.Fprintf(w, "%s:%s ", key, value)
log.Printf("%s:%s", key, value)
}

for _, fileHeaders := range r.MultipartForm.File {
for _, fileHeader := range fileHeaders {
file, _ := fileHeader.Open()
path := fmt.Sprintf("files/%s", fileHeader.Filename)
buf, _ := ioutil.ReadAll(file)
ioutil.WriteFile(path, buf, os.ModePerm)
}
}
}

func main() {
http.HandleFunc("/upload", upload)
http.Handle("/", http.FileServer(http.Dir("static")))
log.Fatal(http.ListenAndServe(":8080", nil))
}

Code changes

Question: Why we need GCS in this project?
Answer: to save the image uploaded from user and serve it.

Update handlerPost to support save image into GCS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
import (
"context"
"cloud.google.com/go/storage"
// other libs
)
const (
INDEX = "around"
TYPE = "post"
DISTANCE = "200km"
PROJECT_ID = "around-75015"
BT_INSTANCE = "around-post"
// other configs

// Needs to update this bucket based on your gcs bucket name.
BUCKET_NAME = "post-images-75015"
)

type Post struct {
// `json:"user"` is for the json parsing of this User field. Otherwise, by default it's 'User'.
User string `json:"user"`
Message string `json:"message"`
Location Location `json:"location"`
Url string `json:"url"`
}

...

func handlerPost(w http.ResponseWriter, r *http.Request) {
// Other codes
w.Header().Set("Content-Type", "application/json")
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Headers", "Content-Type,Authorization")


// 32 << 20 is the maxMemory param for ParseMultipartForm, equals to 32MB (1MB = 1024 * 1024 bytes = 2^20 bytes)
// After you call ParseMultipartForm, the file will be saved in the server memory with maxMemory size.
// If the file size is larger than maxMemory, the rest of the data will be saved in a system temporary file.
r.ParseMultipartForm(32 << 20)

// Parse from form data.
fmt.Printf("Received one post request %s\n", r.FormValue("message"))
lat, _ := strconv.ParseFloat(r.FormValue("lat"), 64)
lon, _ := strconv.ParseFloat(r.FormValue("lon"), 64)
p := &Post{
User: "1111",
Message: r.FormValue("message"),
Location: Location{
Lat: lat,
Lon: lon,
},
}

id := uuid.New()

file, _, err := r.FormFile("image")
if err != nil {
http.Error(w, "Image is not available", http.StatusInternalServerError)
fmt.Printf("Image is not available %v.\n", err)
return
}
defer file.Close()

ctx := context.Background()

// replace it with your real bucket name.
_, attrs, err := saveToGCS(ctx, file, BUCKET_NAME, id)
if err != nil {
http.Error(w, "GCS is not setup", http.StatusInternalServerError)
fmt.Printf("GCS is not setup %v\n", err)
return
}

// Update the media link after saving to GCS.
p.Url = attrs.MediaLink

// Save to ES.
saveToES(p, id)

// Save to BigTable.
//saveToBigTable(p, id)

}

func saveToGCS(ctx context.Context, r io.Reader, bucketName, name string) (*storage.ObjectHandle, *storage.ObjectAttrs, error) {
// Student questions
}

Implement saveToGCS.

Google has provided a good example of writing objects to GCS. https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-go

Google example of open a client connection to GCS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
package main

import (
"fmt"
"log"

// Imports the Google Cloud Storage client package.
"cloud.google.com/go/storage"
"golang.org/x/net/context"
)

func main() {
ctx := context.Background()

// Sets your Google Cloud Platform project ID.
projectID := "YOUR_PROJECT_ID"

// Creates a client.
client, err := storage.NewClient(ctx)
if err != nil {
log.Fatalf("Failed to create client: %v", err)
}

// Sets the name for the new bucket.
bucketName := "my-new-bucket"

// Creates a Bucket instance.
bucket := client.Bucket(bucketName)

// Creates the new bucket.
if err := bucket.Create(ctx, projectID, nil); err != nil {
log.Fatalf("Failed to create bucket: %v", err)
}

fmt.Printf("Bucket %v created.\n", bucketName)
}

Google example of writing an object to GCS (copied from https://github.com/GoogleCloudPlatform/golang-samples/blob/master/storage/objects/main.go
)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
func write(client *storage.Client, bucket, object string) error {
ctx := context.Background()
// [START upload_file]
f, err := os.Open("notes.txt")
if err != nil {
return err
}
defer f.Close()

wc := client.Bucket(bucket).Object(object).NewWriter(ctx)
if _, err = io.Copy(wc, f); err != nil {
return err
}
if err := wc.Close(); err != nil {
return err
}
// [END upload_file]
return nil
}

Answer:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Save an image to GCS.
func saveToGCS(ctx context.Context, r io.Reader, bucketName, name string) (*storage.ObjectHandle, *storage.ObjectAttrs, error) {
client, err := storage.NewClient(ctx)
if err != nil {
return nil, nil, err
}
defer client.Close()

bucket := client.Bucket(bucketName)
// Next check if the bucket exists
if _, err = bucket.Attrs(ctx); err != nil {
return nil, nil, err
}

obj := bucket.Object(name)
w := obj.NewWriter(ctx)
if _, err := io.Copy(w, r); err != nil {
return nil, nil, err
}
if err := w.Close(); err != nil {
return nil, nil, err
}


if err := obj.ACL().Set(ctx, storage.AllUsers, storage.RoleReader); err != nil {
return nil, nil, err
}

attrs, err := obj.Attrs(ctx)
fmt.Printf("Post is saved to GCS: %s\n", attrs.MediaLink)
return obj, attrs, err
}

Remember to install package:

go get -u cloud.google.com/go/storage

这里遇到了迷之bug error: elastic: Error 400 (Bad Request): failed to parse [type=mapper_parsing_exception] 问题出在postman上面, 重新开个post或者试试把url的类型改成plain text

Test

Local test (on your own computer)

cd to the folder where you have main.go

1
go run main.go

Open your Postman, change the method to POST, in the url enter ‘http://localhost:8080/post’

4. Google Cloud Storage-2019-4-30-14-5-10.png

In the Body part, choose form-data, and then enter lat, lon, message and image. For image, the type is ‘file’ such that you may upload an image from your local storage.

Verify the search API is working

Open Postman, change the method to GET, in the url change it to ‘http://localhost:8080/search?lat=37.5&lon=-120.5&range=200’

Copy the url into your browser and download the image to verify that it’s the same image that you’ve downloaded.

If you have any auth issue, try

1
gcloud auth application-default login

If you have permission problem, try

1
sudo chown -R $(whoami):staff ~/.config/gcloud/

Integration test (TBD)

Remote test (homework)

Commit your changes to github and then open a new cloud terminal.

Check out from github

1
git pull origin master

cd to your folder with main.go and app.yaml

1
gcloud app deploy

Wait until the deployment is done

Open Postman.

Instead of using ‘http://localhost:8080’ change it to your service url (without port) For example, ‘http://around-179500.appspot.com/post’ and ‘http://around-179500.appspot.com/search?lat=37.5&lon=-120.5&range=200’

The others are the same

Cors

cross-origin sharing standard

Pre-flight requests: You may have a content type like JSON, or some other custom header that’s triggering a pre-flight request, which your server may not be handling.

Try adding this one, if you’re using the ever-common AJAX in your front-end: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Requested-With
Gorilla’s handlers.CORS() will set sane defaults to get the basics of CORS working for you;

however, you can (and maybe should) take control in a more functional manner.
https://stackoverflow.com/questions/40985920/making-golang-gorilla-cors-handler-work

(Optional) Vim go plugin:

https://github.com/fatih/vim-go

Spam and Abuse Detection

Any user-facing IT company has to address the Spam and Abuse problem in production.

Question: Why our project needs to think about spam problem?

Answer:
Categories of Spam and Abuse in Industry
Racy/Nudity (Child Porn especially)

  • Harassment or Hate Speech (n* word, etc.)
  • Fake News (Facebook)
  • Keyword Spam (Keyword stuffing)
  • Drug Abuse
  • Violence or Bully
  • Spam
  • Copycat (IP Infringement)
  • Phishing (CEO Phishing for SSN)
  • Privacy Leak

    Solutions

  • Keyword based (rule based)
    • n words, s words, f* words, etc.
  • Machine Learning Model (images and videos)
    • Open NSFW model
  • Geo location based
    • Strip clubs, etc.
  • User based
    • User ban
    • User warning
    • Aggregated user signals
  • User feedback
    • report
    • Machine Learning
    • Moderators
  • External channels
    • Internal users like employees
    • Media coverage
  • Others

Question:
In our current design, we kind of spam filters we may enforce?

Answer: For example, keyword based spam filter